SUMMARY

Information on the incidence of Chlamydia trachomatis (CT) is essential for models of the 
effectiveness and cost-effectiveness of screening programmes. We developed two independent 
estimates of CT incidence in women in England: one based on an incidence study, with estimates 
'recalibrated' to the general population using data on setting-specific relative risks, and allowing 
for clearance and re-infection during follow-up; the second based on UK prevalence data, and 
information on the duration of CT infection. The consistency of independent sources of data on 
incidence, prevalence and duration validates estimates of these parameters. Pooled estimates of 
the annual incidence rate in women aged 16-24 and 16-44 years for 2001-2005 using all these 
data were 0.05 [95% credible interval (CrI) 0.035-0.071] and 0.021 (95% CrI 0.015-0.028), 
respectively. Although the estimates apply to England, similar methods could be used in other 
countries. The methods could be extended to dynamic models to synthesize, and assess the 
consistency of, data on contact and transmission rates. 



Key words: Bayesian analysis, Chlamydia, evidence synthesis, incidence, multi-parameter evidence 
synthesis. 



INTRODUCTION 

About 110 000 cases of Chlamydia trachomatis (CT) 
were diagnosed in women in England in 2009 [1]. 
However, CT infection is often asymptomatic and 
undiagnosed, which is one of the key motivating fac- 
tors for screening. Dynamic models of disease trans- 
mission are commonly used to assess the potential 
impact of screening and its cost-effectiveness [2-4], 
and these models need to be consistent with observed 
information on the age-specific incidence of infection. 

One study that provides estimates of CT incidence 
in England has been published [5]. Women aged 
16-24 years were screened for Chlamydia in General 
Practitioner (GP), Family Planning (FP), and Sex- 
ually Transmitted Disease clinic (STD) settings in 
two areas in England, and were followed prospectively 
at 6-month intervals for 6-18 months to assess CT 
infection and re-infection. However, this study is 
restricted to clinic patients and it does not address 
incidence in the English general population. In 
addition, due to the interval-censored observations 
in the study, it is possible for women to have both 
acquired and cleared the infection during the periods 
between observations, leading to under-estimation of 
incidence. This was not accounted for in the original 
paper [5]. 

This paper sets out to produce a set of age-group- 
specific estimates of CT incidence in the general popu- 
lation of women in England based on all available 
evidence. There are several novel aspects in our 
approach. First, we re-analysed the data from the 
LaMontagne et al. [5] incidence study accounting for 
CT clearance, informing clearance rate from a recent 
synthesis of the duration of asymptomatic infection. 
Second, we use information on setting-specific pre- 
valence [6] of CT in the UK, which included estimates 
in the general population as well as in GP, FP, and 
STD settings, to 'recalibrate' the estimated incidence 
rates from LaMontagne to the general population setting. 
Third, we exploit the well-known epidemiological 
relationship prevalence = incidence × duration to 
generate an independent set of incidence estimates based 
on prevalence and duration data. This provides a 
degree of independent validation for the estimates 
obtained directly from the incidence study. Finally, 
we produce a coherent set of estimates of age-specific 
incidence, prevalence, and duration in women 
in the general population that both conform to the 
appropriate epidemiological relationships and are 
based on all the available data. This is an application 
of multi-parameter evidence synthesis [7, 8] to 
Chlamydia epidemiology. 

METHODS 

Multi-parameter evidence synthesis 

Multi-parameter evidence synthesis (MPES) is a 
method for estimating models by statistically combin- 
ing all the available information on model parameters 
and functions of parameters [7, 8]. The uncertainty in 
the data inputs is taken into account and propagated 
through the model. In MPES, parameters are defined 
as basic or functional. The model is fully specified 
by the basic parameters [9]. All functional parameters 
can be written as functions of these basic parameters. 
They are important either because some data inform 
a functional parameter, or because the distribution 
and summary statistics for the functional parameter 
are of interest. 

Data are available on incidence, prevalence, and 
duration, and also on risk factors. Because there is 
information on more functions of parameters than 



there are parameters, it is possible to assess the con- 
sistency of the evidence. A schematic directed acyclic 
graph (DAG) (Fig. 1) shows the relationships between 
the sources of data and the model parameters, and 
spells out their mathematical form. The data sources 
are shown in clear rectangles, and informative priors 
in light grey rectangles. Basic parameters are shown 
in shaded ellipses, and functional parameters in clear 
ellipses. All basic parameters that do not have an 
arrow pointing to them from an informative prior 
have uninformative priors which are not shown on 
the diagram. We estimated the models using the 
Bayesian Markov Chain Monte Carlo (MCMC) 
package WinBUGS [10]. With WinBUGS software 
the user needs to specify the prior distributions on 
the basic parameters, to specify the likelihood for 
each of the data observations, and specify the math- 
ematical relations, as shown in the figure, that link 
them. Full details of the statistical model are given 
in Appendix 1. 

Models and data sources 

An attempt was made to identify data sources on inci- 
dence and prevalence of CT in the UK. A formal sys- 
tematic review was not conducted, but papers were 
identified from recent reviews and synthesis exercises. 
Only one published report on incidence was identified 
[5], and a recent synthesis of UK CT prevalence data 
was also used [6]. Information on CT duration was 
based on a recent synthesis [11] described below. 
The information in Tables 1-4 represents all the information 
incorporated in the synthesis. Below we set out 
the assumptions that were made about the processes 
that generated the data, and the main features of the 
synthesis model. We begin by discussing the duration 
of CT infection which is required for all subsequent 
analyses. 

Duration of CT infection 

The mean duration of infection, $\Delta$, can be expressed 
as a weighted average of the length of asymptomatic 
(untreated) infection $\Delta_A$ and symptomatic (treated) 
infection $\Delta_S$, 

$$\Delta = \Delta_S\,\varphi + \Delta_A\,(1 - \varphi), \qquad (1)$$

with $\varphi$ being the proportion of incident infections in 
which symptoms develop. In the Discussion section 
we show that results would have been similar 
if we considered durations of treated and untreated 
infections instead. 
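
As a rough plug-in check on equation (1), the point estimates listed in Table 1 can be substituted directly. Here $\Delta_S$, $\Delta_A$ and $\varphi$ denote the symptomatic duration, the asymptomatic duration and the proportion symptomatic; the arithmetic below uses posterior means only and ignores their uncertainty, so it is an illustration rather than the synthesis itself.

```latex
\Delta \approx 0.115 \times 0.231 + 1.36 \times (1 - 0.231)
       \approx 0.027 + 1.046
       \approx 1.07 \text{ years}
```

This is close to the mean duration of about 1.07 years reported for the separate models in Table 6.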

[Figure 1. Data-source nodes: setting-specific prevalence ratios (Adams [6]); proportions infected and re-infected by age and setting (LaMontagne [5]); infection/re-infection weights (LaMontagne [5]); duration synthesis (Price [11]).]

Fig. 1. Directed acyclic graph (DAG) of the evidence network. The data sources are shown in clear rectangles, and 
informative priors in light grey rectangles. Basic parameters are shown in shaded ellipses, and functional parameters in 
clear ellipses. The arrows show the direction of flow. Light arrows point to data; heavy arrows indicate functional 
relationships between parameters, which are set out as equations. Data may be available to provide evidence on either 
basic or functional parameters. All basic parameters that do not have an arrow pointing to them from an informative 
prior have uninformative priors which are not shown on the diagram. The 'basic' parameters are: $\lambda_{1,1,1}$, the infection rate 
in age group 1, setting 1; $\gamma_a$, the hazard ratio for infection in age group a relative to group 1 (age 16-17 years); $\rho_s$, the 
hazard ratio for infection in setting s relative to setting 1 (GP setting); $\eta_s$, the setting-specific re-infection:infection rate ratio; 
$p_{a,GP}$, the proportion of GP attenders in age group a at recruitment in the LaMontagne study that were in 
the re-infection group, reweighted to account for differential recruitment; $\Delta_A$ and $\Delta_S$, the durations of asymptomatic and 
symptomatic infection; $\varphi$, the proportion of incident infections in which symptoms develop. Functional parameters are: 
the clearance rates of asymptomatic and symptomatic infection; $\lambda_{a,s,i}$, incidence in age a, setting s, for infections (i = 1) 
and re-infections (i = 2); $\kappa(t)_{a,s,i}$, the proportion infected in that group after t years (for the LaMontagne study t = 0.5); 
$\lambda^{FOI}_{a,pop}$ and $\lambda^{INC}_{a,pop}$, the force of infection and the incidence rate in the general population; $\Delta$, the average duration of infection. 
$\pi_{a,pop}$, the general population prevalence at age a, is either a basic or functional parameter depending on whether separate 
incidence estimates (methods A and B performed in parallel) or the full synthesis model is being used. The black bar 
indicates where the network can be cut to obtain separate estimates of incidence. (Further explanation is given in the text 
and the statistical Appendix.) 



For the duration of asymptomatic CT infections, 
$\Delta_A$, we use an estimate of 1.36 (95% CrI 1.11-1.62) 
years, based on a previous evidence synthesis of 
studies on CT duration in asymptomatic women 
[11]. This was a synthesis of nine studies identified 
from recent reviews [12-14], four that recruited 
asymptomatic infected women in STD clinic settings, 
and five studies based on population screening. 
Evidence was presented that these approximately 
represented incident and prevalent infections, respectively. 
The authors fitted mixtures of exponential models 
to these data. The estimates used here (Table 1) 
were based on a model that assumed CT infections 
clear at a constant rate. 

Studies of CT duration have the inherent limitation 
that patients may clear infection and be re-infected 
during the follow-up period. For this reason we consider 
same-partner re-infections, which microbiological 
evidence suggests comprise the great majority of 
re-infections [15], to be part of a continuous episode. 

Table 1. Data on duration (years) of Chlamydia trachomatis infection 

Parameter                                                     Mean (95% CrI)         Source
Duration of asymptomatic infection                            1.36 (1.11-1.62)       Price et al. [11]
Duration of symptomatic infection                             0.115 (0.079-0.151)    See text
Duration of symptomatic infection (sensitivity analysis)      0.144 (0.062-0.227)    See text
Proportion of incident infections in which symptoms develop   0.231 (0.159-0.311)    Geisler et al. [16]

CrI, Credible interval. 



Table 2. Data derived from tables 2 and 4 of 
LaMontagne et al. [5] on infection and re-infection 
rates per 100 woman-years: numerators r and 
denominators n 

                     Infection              Re-infection
Setting  Age (yr)    r     Rate    n*       r     Rate    n*
GP       16-17       4     11.2    73       5     86.2    14
GP       18-20       3      3.1    195      7     22.8    65
GP       21-24       4      4.3    188      10    26.9    79
FP       16-17       9      9.5    194      13    29.4    95
FP       18-20       5      3.7    273      12    19.9    127
FP       21-24       7      7.1    201      5     16.6    63
STD      16-17       5     10.1    102      6     32.3    40
STD      18-20       16    14.1    235      15    22.8    139
STD      21-24       9      7.5    245      5     12.8    81

GP, General practitioner; FP, family planning; STD, 
sexually transmitted disease clinic. 

* n is estimated as the total number of 6-month follow-up 
periods (events were assumed to happen halfway between 
observations when the rates were estimated in LaMontagne). 
This has been calculated from the reported rates and 
numbers of events. 
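
The back-calculation of n described in the Table 2 footnote can be sketched as follows. This is one reading of the footnote, not the authors' code: it assumes that a 6-month period with an event contributes 0.25 woman-years, that an event-free period contributes 0.5 woman-years, and the function name `follow_up_periods` is hypothetical.

```python
def follow_up_periods(events, rate_per_100_wy):
    """Estimate n, the number of 6-month follow-up periods.

    Assumption: each event happens halfway through its 6-month period,
    so woman_years = 0.5 * (n - events) + 0.25 * events.
    """
    woman_years = 100.0 * events / rate_per_100_wy
    # Solve the woman-years identity above for n.
    return 2.0 * woman_years + 0.5 * events


# (age group, r, rate) for infections in the GP setting, from Table 2.
for age, r, rate in [("16-17", 4, 11.2), ("18-20", 3, 3.1), ("21-24", 4, 4.3)]:
    print(age, round(follow_up_periods(r, rate)))
```

Under these assumptions the rounded values agree with the n column of Table 2 for the rows shown.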

The proportion of CT infections, $\varphi$, in which symptoms 
develop can be estimated from studies where 
asymptomatic women within a few days of exposure 
are followed without treatment to determine if symptoms 
develop, and we have interpreted studies of 
asymptomatic women attending for STD testing as 
studies of this type. This interpretation is supported 
by the synthesis of studies on CT duration described 
above [11]. We identified only one such study reporting 
the proportion of incident CT in which symptoms 
develop [11]. This found that 26 out of a total of 115 
women developed symptoms, estimating $\varphi$ at 23% 
(95% CI 16-31) [16]. 

Duration of symptomatic infection, $\Delta_S$, is defined as 
the time between the point at which the patient 



Table 3. Estimated prevalence of Chlamydia 
trachomatis in females in the general population 
reported in table 4 in Adams et al. [6] 

Age (yr)    Prevalence (95% CI)
18-19       0.048 (0.032-0.076)
20-24       0.032 (0.021-0.049)
25-29       0.015 (0.010-0.025)
30-44       0.008 (0.005-0.013)

CI, Confidence interval. 



Table 4. Reported adjusted odds ratios for the effect of 
setting on Chlamydia prevalence in females in the UK, 
from table 3 in Adams et al. [6] 

Setting                      OR (95% CI)
General population vs. GP    0.6 (0.37-0.95)
FP vs. GP                    1.27 (1.00-1.62)
STD vs. GP                   2.39 (0.72-3.33)

OR, Odds ratio; CI, confidence interval; GP, general 
practitioner; FP, family planning; STD, sexually transmitted 
disease clinic. 



becomes infected, and the point at which the infection 
is diagnosed, or the patient is empirically treated and 
the infection is cleared. This could be derived from 
information on the incubation period of CT and 
studies of time taken to seek healthcare in women 
subsequently diagnosed with CT. A recent literature 
search [12] found no data on incubation period, and 
although there were studies of time to seek healthcare 
in women with genital symptoms, specific information 
on those diagnosed with CT was not found. We have 
placed an informative prior on the time from infection 
to diagnosis assuming it is uniformly distributed 
between 4 and 8 weeks, and that once diagnosed a 
woman would not participate in a prevalence survey. 
We assess sensitivity to this by fitting a model where 
the duration of symptomatic infection varies uniformly 
from 3 to 12 weeks. 
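
The uniform priors described above can be summarised in years with a short calculation. This is a sketch only: the weeks-to-years conversion (365.25/7 weeks per year) is an assumption, since the text does not state the one used, and `uniform_summary_years` is a hypothetical helper name.

```python
WEEKS_PER_YEAR = 365.25 / 7  # assumed conversion; not stated in the text


def uniform_summary_years(low_weeks, high_weeks):
    """Mean and central 95% interval of a Uniform(low, high) prior, in years."""
    width = high_weeks - low_weeks
    mean = (low_weeks + high_weeks) / 2
    lower = low_weeks + 0.025 * width   # 2.5th percentile
    upper = low_weeks + 0.975 * width   # 97.5th percentile
    return tuple(x / WEEKS_PER_YEAR for x in (mean, lower, upper))


main = [round(x, 3) for x in uniform_summary_years(4, 8)]
sensitivity = [round(x, 3) for x in uniform_summary_years(3, 12)]
print(main, sensitivity)
```

With this conversion the 4-8 week prior gives the symptomatic-duration entry in Table 1, and the 3-12 week prior agrees with the sensitivity-analysis entry to within about 0.001 years.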

CT incidence data 

The only study of CT incidence in England is 
LaMontagne et al. [5]. Women aged 16-24 years 
were screened for Chlamydia in GP, FP, and STD 
settings in two areas in England in 2003-2004, and 
were followed prospectively at 6-month intervals for 
6-18 months to assess CT infection and re-infection. 
A ligase chain reaction (LCR) test was used, for 
which we assumed 100% sensitivity and specificity. 
Women found positive were treated. Table 2 gives 
the proportions of 6-month-long observations in 
which CT-negative women were CT positive on 
follow-up. These are divided into 'infections' and 
're-infections': the latter being infections observed in 
women who were CT positive on recruitment or 
were infected during the follow-up period. The data 
are reported for age groups (a= 1, 16-17 years; a = 2, 
18-20 years; a = 3, 21-24 years). 



Regression model to estimate infection and re-infection 
rates by age and setting 

We model the infection rates $\lambda_{a,s,1}$ as a function of a 
baseline infection rate $\lambda_{1,1,1}$ multiplied by the between-setting 
hazard ratios $\rho_s$ and the between-age-group 
hazard ratios $\gamma_a$. The age- and setting-specific 
re-infection rates $\lambda_{a,s,2}$ equal the respective infection 
rate multiplied by a setting-specific re-infection hazard 
ratio $\eta_s$ [equation (2)]. Other regression models are 
considered in Appendix 2: 

$$\lambda_{a,s,1} = \gamma_a\,\rho_s\,\lambda_{1,1,1} \quad \text{and} \quad \lambda_{a,s,2} = \eta_s\,\lambda_{a,s,1}. \qquad (2)$$

The infection rates $\lambda_{a,s,i}$ in equation (2) are 
informed by the data in Table 2, which shows the 
number of initially uninfected women in each age 
and setting who were found to be infected after 
a 6-month follow-up period. However, the mathematical 
relationship between the infection rates in 
each group and the proportions infected $\kappa(t)_{a,s,i}$ at 
the end of a period of time length t is complex. 
The formula shown in Figure 1 allows for the fact 
that in the LaMontagne data it is possible for a 
woman to clear infection spontaneously or through 
treatment, and then re-acquire infection within 
the 6-month follow-up. It is necessary therefore to 
take account of the clearance rates of symptomatic 
and asymptomatic infection, and the proportion of 
incident infections that become symptomatic (see 
Appendix 1). 
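
The effect of clearance between observations can be illustrated with a deliberately simplified two-state Markov chain. This is not the formula in Figure 1, which distinguishes symptomatic from asymptomatic clearance; the single clearance rate, the infection rate of 0.10 per year and the function names are all assumptions made for illustration.

```python
import math


def naive_proportion_infected(rate, t):
    """P(infected at t) if infections were never cleared: 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t)


def two_state_proportion_infected(rate, clearance, t):
    """P(infected at t | uninfected at 0) in a two-state Markov chain.

    Women move uninfected -> infected at `rate` and infected -> uninfected
    at `clearance`; any number of clear/re-acquire cycles is allowed.
    """
    total = rate + clearance
    return rate / total * (1.0 - math.exp(-total * t))


rate = 0.10              # hypothetical infection rate per year
clearance = 1.0 / 1.07   # assumed: reciprocal of a mean duration of 1.07 years
t = 0.5                  # 6-month interval between observations

naive = naive_proportion_infected(rate, t)
observed = two_state_proportion_infected(rate, clearance, t)
print(round(naive, 4), round(observed, 4))
```

For the same underlying rate, fewer women are found positive at follow-up once clearance is allowed, so a model that ignores clearance needs a lower rate to reproduce the observed proportion and therefore under-estimates incidence.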



Estimation of force of infection (FOI) 

The infection and re-infection rates can be used to estimate 
the mean FOI, $\lambda^{FOI}_{a,s}$, in the CT-negative women 
in each setting and age group using equation (3): 

$$\lambda^{FOI}_{a,s} = (1 - p_{a,s})\,\lambda_{a,s,1} + p_{a,s}\,\lambda_{a,s,2}, \qquad (3)$$

where the weights $p_{a,s}$ are given by the prevalence 
of CT in each setting observed in the LaMontagne 
study (Appendix 1). However, as the LaMontagne 
study only samples from GP, STD, and FP settings it 
is necessary to turn to a third source of evidence, CT 
prevalence, to map these estimates of FOI to estimates 
for the general population. 

CT prevalence 

CT prevalence varies by age and setting. Table 3 shows 
estimates of CT prevalence by age in the general popu- 
lation from a logistic regression of UK prevalence 
studies [6] identified by a systematic review in 2004. 
These data inform the absolute prevalence in 18- to 
19-year-olds, $\pi_{1,pop}$ (the youngest age group in the 
study), and the relative risk $RR_a$ of infection in the 
generic age group a relative to age 18-19 years so that: 

$$\pi_{a,pop} = \pi_{1,pop} \cdot RR_a.$$

Table 4 shows prevalence odds ratios for the different 
settings FP, STD clinics, and general population 
settings (pop), relative to the General Practice (GP) 
setting, from the same study: these are used to 
inform setting-specific relative risks (RRs). The inter- 
pretation of odds ratios as relative risks is an approxi- 
mation that is justified by the rarity of the disease 
[17]. Other prevalence data have been collected sub- 
sequently [18, 19], but have not been incorporated 
due to doubts about the national representativeness. 

In order to use these data to map our estimates of 
FOI to the general population we make the assump- 
tion that the between-setting and between age-group 
relative risks in the prevalence data directly inform 
the hazard ratios ($\gamma_a$ and $\rho_s$) in the incidence model 
described above. For a fixed duration, prevalence 
ratios must be equal to incidence rate ratios, so the 
assumption is that ratios of incidence are equivalent 
to ratios of FOI, and of infection. We consider this 
assumption further in Appendix 2. 

A first estimate of CT incidence in England (method A) 

We use the odds ratio between the general population 
setting and the GP setting, which informs the 
parameter $\rho_{pop}$, to map the FOI in the GP setting 
to provide an estimate of the FOI in women in the 
general population, $\lambda^{FOI}_{a,pop}$: 

$$\lambda^{FOI}_{a,pop} = \rho_{pop}\,\lambda^{FOI}_{a,GP}. \qquad (4)$$


Estimates of FOI are of interest in themselves. 
However, we can easily calculate the annual population 
incidence rate $\lambda^{INC1}_{a,pop}$ (years$^{-1}$) for age groups 
16-17, 18-20, and 21-24 years as a function of FOI 
(years$^{-1}$) and duration using equation (5): 

$$\lambda^{INC1}_{a,pop} = \frac{\lambda^{FOI}_{a,pop}}{1 + \lambda^{FOI}_{a,pop}\,\Delta}. \qquad (5)$$
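
The chain from setting-specific rates to a population incidence (method A) can be sketched as three small functions. The last step assumes the steady-state relations incidence = FOI × (1 − prevalence) and prevalence = incidence × duration, which together give incidence = FOI / (1 + FOI × duration). All input values below are hypothetical except the general population vs. GP ratio of 0.6 from Table 4, and the function names are illustrative.

```python
def force_of_infection(p_reinfection_group, infection_rate, reinfection_rate):
    """Weighted mean of infection and re-infection rates (weights p and 1 - p)."""
    return ((1.0 - p_reinfection_group) * infection_rate
            + p_reinfection_group * reinfection_rate)


def population_foi(rho_pop, foi_gp):
    """Scale the GP-setting FOI by the general population vs. GP hazard ratio."""
    return rho_pop * foi_gp


def incidence_from_foi(foi, mean_duration):
    """FOI acts only on uninfected women, so incidence = FOI * (1 - prevalence)."""
    return foi / (1.0 + foi * mean_duration)


# Hypothetical inputs: 5% of women in the re-infection group, rates per year.
foi_gp = force_of_infection(0.05, 0.10, 0.70)
foi_pop = population_foi(0.6, foi_gp)          # 0.6 is the Table 4 odds ratio
incidence = incidence_from_foi(foi_pop, 1.07)  # duration in years, assumed
print(round(foi_gp, 3), round(foi_pop, 3), round(incidence, 4))
```

Because prevalence is low, the population incidence is only slightly below the population FOI.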



A second estimate of CT incidence in England 
(method B) 

A second estimate of the annual population incidence 
(years$^{-1}$) can be obtained using data on duration 
and data on prevalence using the relationship: 
incidence = prevalence/duration, so that: 

$$\lambda^{INC2}_{a,pop} = \frac{\pi_{a,pop}}{\Delta}, \qquad (6)$$

where duration is estimated as previously described 
and prevalence $\pi_{a,pop}$ is informed directly by the 
data in Table 3, so $\lambda^{INC2}_{a,pop}$ is estimated for the groups 
18-19, 20-24, 25-29, and 30-44 years. 
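
A plug-in version of method B can be written in a few lines. It uses the prevalence point estimates from Table 3 and a mean duration of about 1.07 years (the value reported for the separate models in Table 6); it ignores uncertainty, so the results are close to, but not the same as, the posterior means reported for method B.

```python
MEAN_DURATION_YEARS = 1.07  # point estimate; uncertainty is ignored here

# General population prevalence by age group (Table 3 point estimates).
prevalence = {"18-19": 0.048, "20-24": 0.032, "25-29": 0.015, "30-44": 0.008}


def incidence_from_prevalence(prev, duration):
    """incidence = prevalence / duration."""
    return prev / duration


incidence = {age: incidence_from_prevalence(p, MEAN_DURATION_YEARS)
             for age, p in prevalence.items()}
for age, value in incidence.items():
    print(age, round(value, 4))
```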

Full synthesis model 

We can combine both of the above analyses in a single 
joint synthesis using the relationship: 

$$\pi_{a,pop} = \lambda^{INC}_{a,pop}\,\Delta, \qquad (7)$$

where $\lambda^{INC}_{a,pop}$ is informed as described in method A, 
and the parameters $\pi_{a,pop}$ and $\Delta$ are informed as 
described in method B. This is shown in the DAG 
in Figure 1. This single joint analysis provides 
estimates of population incidence for age groups 16-17, 
18-20, 21-24, 25-29, and 30-44 years. The only age 
groups for which incidence is estimated in both 
methods A and B are 18-20 and 21-24 years. 
However, estimates for the other age groups are also 
expected to change. The full synthesis model provides 
estimates for all parameters based on the entire data 
ensemble. So, for example, when our knowledge of 
the regression parameters described in method A is 
updated by the data described in method B, estimates 
of the annual population incidence rate in 16- to 
17-year-olds may change. 

Note that the DAG in Figure 1 also describes 
methods A and B above. We remove the constraint 
that prevalence = incidence × duration shown on the 
DAG under the heavy black bar, replacing it with 
equation (6) above, and place uninformative priors 
on $\pi_{a,pop}$. This single unconstrained model then 
produces estimates from both methods A and B in 
parallel. 

Statistical estimation and model critique 

The full specification of the model is set out in 
Appendix 1. Estimation was performed using a 
Bayesian approach, where the posterior distribution 
was sampled through MCMC implemented in the 
WinBUGS package version 1.4.3 [10]. The Bayesian 
approach was taken because of its flexibility in pooling 
information on complex functions of parameters: we 
would expect similar results from a frequentist 
approach. MCMC estimation is performed by drawing 
thousands of samples from the joint posterior 
distribution. The first 50000 iterations were discarded: this 
was the 'burn-in' period to ensure that the distributions 
had converged to the posterior. The Brooks-Gelman-Rubin 
statistic [20] demonstrated convergence of 
all parameters to their posterior distribution after 
at most 25000 samples. The results reported below 
are summary means and credible intervals of the 
marginal distributions from this joint posterior based 
on 200000 samples from each of two chains. 

To assess goodness of fit, we used the posterior 
mean residual deviance, which should approximate 
to the number of data points under the assumption 
that the model is true [21, 22]. We compared the good- 
ness of fit of the combined synthesis model and the 
model with separate incidence estimates: this provides 
a direct assessment of the statistical assumptions. 
A graphical comparison of the separate incidence 
estimates is also presented. We assessed the validity 
of some more specific statistical assumptions in 
Appendix 2. Unless otherwise stated, vague priors 
are employed throughout, so that results are dominated 
by the data. The WinBUGS code is available 
along with the datasets as Supplementary online 
material. It has been annotated to help readers to 
understand the model. 
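
The residual deviance used for model critique can be sketched for a single observation. This assumes a binomial likelihood for counts such as those in Table 2 (r events out of n periods); the likelihoods actually used are specified in Appendix 1, and the function name is hypothetical. In an MCMC run the quantity would be evaluated at each posterior draw and averaged, with each well-fitted data point contributing about 1 to the posterior mean.

```python
import math


def binomial_residual_deviance(r, n, p):
    """Residual deviance of one observation r ~ Binomial(n, p), with 0 < p < 1.

    It is zero when the fitted count n * p equals r and grows as the
    fitted and observed counts diverge.
    """
    fitted = n * p
    dev = 0.0
    if r > 0:
        dev += r * math.log(r / fitted)
    if r < n:
        dev += (n - r) * math.log((n - r) / (n - fitted))
    return 2.0 * dev


# Table 2, GP setting, age 16-17: r = 4 infections in n = 73 periods.
perfect = binomial_residual_deviance(4, 73, 4 / 73)
off = binomial_residual_deviance(4, 73, 0.15)  # 0.15 is an arbitrary poor fit
print(round(perfect, 6), round(off, 3))
```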

RESULTS 

Table 5 shows the posterior estimates of the annual 
population incidence from method A (column 2), 
method B (column 3), and the full synthesis model 
(column 4). Estimates from method A are available 
for the 16-17, 18-19, and 20-24 years age groups, 

Table 5. Population Chlamydia trachomatis incidence rate (years$^{-1}$) in women by age estimated using each method 

Parameter                                            Method A (adjusted      Method B (prevalence      Full synthesis model
                                                     incidence data)         and duration data)        (all data)
$\lambda^{INC}_{16-17,pop}$, popn incidence, 16-17 yr    0.122 (0.057-0.235)     n.a.                      0.082 (0.047-0.134)
$\lambda^{INC}_{18-19,pop}$, popn incidence, 18-19 yr    0.070 (0.036-0.126)     0.046 (0.028-0.072)       0.048 (0.032-0.068)
$\lambda^{INC}_{20-24,pop}$, popn incidence, 20-24 yr    0.060 (0.032-0.106)     0.031 (0.019-0.048)       0.039 (0.027-0.054)
$\lambda^{INC}_{25-29,pop}$, popn incidence, 25-29 yr    n.a.                    0.015 (0.009-0.023)       0.015 (0.0087-0.024)
$\lambda^{INC}_{30-44,pop}$, popn incidence, 30-44 yr    n.a.                    0.0078 (0.0045-0.013)     0.0080 (0.0045-0.013)

n.a., Not available. 

Results given are mean (95% credible interval). 



[Figure 2, panel (a). x-axis: Chlamydia trachomatis incidence, 18-19 years.]


Fig. 2. Marginal posterior distributions of incidence parameters, comparing results based on the information in incidence 
study (LaMontagne), with results based on information in prevalence and duration studies (alone), and with results based 
on pooling all sources of information (a) age group 18-19 years, (b) age group 18-20 years. 



method B provides estimates for the 18-19, 20-24, 25-29, 
and 30-44 years age groups, and estimates for all 
age groups are available from the full synthesis model. 
Estimates from the full synthesis model are around a 
factor of 1.5 lower than those obtained from method 
A, but only marginally higher than those obtained 
from method B. This is because the uncertainty in 
the incidence information from the LaMontagne 
data is much greater than in the combined duration 
and prevalence information. This effect is shown 
graphically in Figure 2, which compares the estimates 
of incidence in the 18-19 and 20-24 years age groups, 
and also shows the combined estimate incorporating 
all data sources. Results from the full synthesis 
model for all five age groups are also given in 
Figure 3. 

Table 6. Parameter estimates obtained from the separate models fitted using methods A and B in parallel (column 2), 
and from the full synthesis model (column 3) 

                                                           Separate models       Full synthesis model
Parameter                                                  Mean (95% CrI)        Mean (95% CrI)

CT duration and clearance rate
  Clearance rate, asymptomatic                             0.74 (0.62-0.90)      0.77 (0.64-0.95)
  $\Delta$, mean duration (years)                          1.07 (0.86-1…)        1.03 (0.87-1.75)
  $\varphi$, proportion symptomatic                        0.23 (0.16-0.31)      0.23 (0.16-0.32)

CT incidence - regression parameters
  $\eta_{GP}$, re-infection:infection ratio, GP            7.31 (4.07-11.9)      7.08 (3.97-11.6)
  $\eta_{FP}$, re-infection:infection ratio, FP            3.52 (2.09-5.52)      3.66 (2.16-5.77)
  $\eta_{STD}$, re-infection:infection ratio, STD          2.01 (1.17-3.17)      2.08 (1.21-3.28)
  $\rho_{GP}$, hazard ratio, GP (reference group)          1                     1
  $\rho_{pop}$, hazard ratio, general population           0.62 (0.37-0.96)      0.46 (0.33-0.63)
  $\rho_{FP}$, hazard ratio, FP                            1.28 (1.02-1.59)      1.30 (1.03-1.61)
  $\rho_{STD}$, hazard ratio, STD                          2.38 (1.78-3.11)      2.45 (1.83-3.20)

CT prevalence, %
  $\pi_{16-17,pop}$, general population, 16-17 yr          n.a.                  8.38 (4.94-13.5)
  $\pi_{18-19,pop}$, general population, 18-19 yr          4.91 (3.13-7.28)      4.85 (3.47-6.59)
  $\pi_{20-24,pop}$, general population, 20-24 yr          3.27 (2.10-4.83)      3.96 (2.89-5.30)
  $\pi_{25-29,pop}$, general population, 25-29 yr          1.54 (0.95-2.35)      1.54 (0.95-2.37)
  $\pi_{30-44,pop}$, general population, 30-44 yr          0.82 (0.50-1.28)      0.83 (0.50-1.29)

CrI, Credible interval; CT, Chlamydia trachomatis; n.a., not available; GP, general practitioner; FP, family planning; STD, 
sexually transmitted disease. 



[Figure 3. x-axis: Chlamydia trachomatis incidence; one curve per age group.]




Fig. 3. Posterior distribution of incidence, by age range, based on all available information. 



Table 6 shows the estimates of the basic parameters 
in the model estimated when the constraint that 
prevalence = incidence × duration is excluded 
(column 2) and included (column 3) in the model, 
representing, respectively, methods A and B being 
performed simultaneously in parallel, and the full 
synthesis model. It shows that for most parameters 
(duration, proportion symptomatic, re-infection:infection 
rate ratios, age- and setting-specific risk 
ratios, and prevalence parameters) the synthesis 
has not contributed much additional information 
over and above the 'direct' data already available. 
However, the general population-to-GP relative risk 
is lowered by a factor of about 1.35 compared to 
method A and the 95% credible intervals are about 
half the width. 
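
The coherence imposed by the full synthesis model can be checked roughly from the published summaries. The sketch below multiplies the posterior mean incidences (Table 5, full synthesis column) by the posterior mean duration of about 1.03 years (Table 6) and compares the result with the posterior mean prevalences (Table 6). A product of posterior means is not the posterior mean of a product, so only approximate agreement should be expected.

```python
MEAN_DURATION_YEARS = 1.03  # full synthesis model, Table 6

# Posterior means from the full synthesis model (Tables 5 and 6).
incidence_per_year = {"16-17": 0.082, "18-19": 0.048, "20-24": 0.039,
                      "25-29": 0.015, "30-44": 0.0080}
reported_prevalence_pct = {"16-17": 8.38, "18-19": 4.85, "20-24": 3.96,
                           "25-29": 1.54, "30-44": 0.83}


def implied_prevalence_pct(incidence, duration):
    """prevalence = incidence x duration, expressed in percent."""
    return 100.0 * incidence * duration


gaps = {age: implied_prevalence_pct(inc, MEAN_DURATION_YEARS)
             - reported_prevalence_pct[age]
        for age, inc in incidence_per_year.items()}
for age, gap in gaps.items():
    print(age, round(gap, 3))
```

Every age group agrees to within about 0.1 percentage points.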

Table 7. Model fit statistics for each dataset from the separate models fitted using methods A and B in parallel and 
the full synthesis model 

              Number of      Number of parameters             Mean residual deviance
Source        data points    Separate models   Full model     Separate models   Full model
Incidence     21             9                 9              18.4              19.5
Prevalence    4              4                 2              4.0               4.1
Duration      2              2                 2              2.0               2.2
Total         27             15                13             24.3              25.7


Our separate models of incidence in women aged 
16-24 years included nine parameters and had a 
residual deviance of 18.4 for a dataset with 21 data 
points (Tables 2 and 4), representing a good fit 
(Table 7). When these data are combined with the 
prevalence information, residual deviance increases 
only marginally (19.5), indicating a lack of conflict 
between the different sources of information on 
incidence. Prevalence and duration data also fitted 
equally well. Results (not shown) with a wider 
uniform prior distribution on the duration of symptomatic 
infection, 3-12 weeks rather than 4-8 weeks, 
were almost identical (<1% multiplicative change). 
We therefore recommend using results from the full 
synthesis model that uses all of the data, giving an 
estimated incidence rate in females aged 16-24 years, 
the population targeted by the National Chlamydia 
Screening Programme, of 0.05 per year (95% CrI 
0.035-0.071), and in females aged 16-44 years 0.021 
per year (95% CrI 0.015-0.028). 

DISCUSSION 

While CT prevalence in the general UK population has 
been studied [6], incidence estimates have only been pro- 
duced in clinic patients [5]. We used data on ratios 
between clinic settings and the general population in 
prevalence to 'recalibrate' the incidence data to a 
lower value appropriate to a general population setting. 
We were able to show that three independent and separ- 
ate sets of data on prevalence, incidence and duration 
were all consistent with each other, under a model 
which captured the logical relationships between these 
parameters. The possibility of clearance of infection 
and re-infection during the follow-up period was also 
taken into account: the effect of this is to raise incidence 
estimates above the levels that are directly observed. 
The estimate based on the recalibrated incidence study 
was found to be compatible with an estimate based 
on combining prevalence and duration information. 



A certain degree of simplification is involved. The 
incidence data were collected in 2003-2004, 2-3 years 
later than the NATSAL study [23], which contributes 
all the general population prevalence information to 
the estimates in the Adams study [6], and in a period 
before intensive screening was taking place. We have 
assumed that incidence is unlikely to have changed 
greatly between these dates, and that our estimates 
are therefore relevant to the years 2001-2005. 

The application of Bayesian evidence synthesis 
methods to CT epidemiology can shed light on the 
value of different study designs and the relationships 
between them. In addition, the ability to confirm, for 
example, that setting- and age-specific risk ratios in 
an incidence study are compatible with odds ratios in 
prevalence studies, makes it a valuable approach to epi- 
demiology. However, as with any evidence synthesis 
method, conclusions are limited by the quality of the 
original data and the assumptions made in interpreting 
them. The CT prevalence information in NATSAL 
was based on self-testing in a structured population 
sample and is vulnerable to response biases, although 
these have been extensively analysed elsewhere [24]. 
The incidence data were collected in two English 
areas, which were metropolitan and urban. The extent 
to which these data can be assumed to be nationally 
representative is not known. Finally, the estimate of the
duration of asymptomatic CT infection, 1.36 (95%
CrI 1.11-1.62) years, was based on an earlier synthesis
of studies with different designs [11]. The estimate
assumes a constant clearance rate, and the model did 
not allow for re-infection. However, a model including 
fast and slow clearers provided no improvement in 
residual deviance, and the bias introduced by not 
accounting for CT re-infection in duration studies is 
far lower than the bias introduced by not accounting 
for clearance in studies of incidence. These findings 
are supported by a recent analysis by Althaus et al. [25],
who fitted a reversible model to data extracted from the
Kaplan-Meier curve in the Molano et al. study [15]. They
showed that a single-rate model provided a good fit and
found that allowing for
re-infection had almost no impact on estimates of 
duration. A further key assumption in [11] was that 
clinic-based studies on asymptomatic women could 
effectively be interpreted as studies of incident in- 
fection, while studies of population screening were 
picking up prevalent infection. The authors cited 
several external evidence sources supporting this 
assumption. 

Additional validation of our estimates and of our 
overall approach is available by multiplying our esti- 
mated incidence rate by the number of women aged 
16-24 years in England based on population census 
projections for 2002 [26]. This predicts a total of
137 100 (95% CrI 95 520-192 500) infections.
This can be compared to the 31 510 and
34 660 women aged 16-24 years who were treated
for CT in STD clinics in 2002 and 2003, respectively
[1]. The ratios of numbers treated to predicted total
infections in women aged 16-24 years are 24% (95% CrI
16-33) for 2002 and 26% (95% CrI 18-36) for 2003.
This accords closely with the proportion of infections 
in which symptoms develop estimated from the model, 
and the Geisler et al. [16] findings. Therefore, had we 
partitioned women as treated or untreated when esti- 
mating the mean duration as is often done in dynamic 
models and used recursive equations to estimate the 
proportion treated from routine data, we would have 
obtained almost identical results. 
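
The comparison above is simple arithmetic and can be retraced from the figures quoted in the text. The sketch below divides the numbers treated by the central predicted total; the variable and function names are illustrative. Ratios of point estimates need not coincide exactly with the reported 24% and 26%, which are posterior summaries of the ratio.

```python
# Figures quoted in the text (women aged 16-24 years, England).
predicted_infections = 137_100          # central estimate of annual infections
predicted_interval = (95_520, 192_500)  # 95% credible interval
treated = {2002: 31_510, 2003: 34_660}  # treated for CT in STD clinics


def treated_fraction(n_treated, n_predicted):
    """Ratio of treated infections to predicted total infections."""
    return n_treated / n_predicted


for year, n in treated.items():
    central = treated_fraction(n, predicted_infections)
    # Dividing by the interval ends gives a rough range only; it is not a
    # substitute for the posterior interval of the ratio.
    low = treated_fraction(n, predicted_interval[1])
    high = treated_fraction(n, predicted_interval[0])
    print(f"{year}: {central:.1%} (rough range {low:.1%} to {high:.1%})")
```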

Although estimates of CT incidence and prevalence 
in England may be of limited interest elsewhere, the 
study does have wider implications. First, the fact 
that incidence, prevalence and duration evidence is 
internally consistent provides a degree of independent 
validation of our estimates of all three parameters. 
Second, it indicates that estimates of CT prevalence 
or incidence in other countries can each be generated 
from the other, using our estimates of duration. 
Alternatively, where information is available on both 
incidence and prevalence, a similar exercise could be 
carried out to provide a further validation of our 
results and the models on which they are based. 
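
The conversion described here rests on the steady-state relationship prevalence = incidence × mean duration that underlies the prevalence-based estimate. A minimal sketch follows. The function names are ours, the prevalence value is hypothetical, and the 6-week symptomatic duration is an assumed midpoint of the 4-8 week range; only the 1.36-year asymptomatic duration and the 26/115 symptomatic proportion are taken from the text.

```python
def mean_duration(phi, dur_symptomatic, dur_asymptomatic):
    """Mean duration of infection (years), weighted by proportion symptomatic."""
    return phi * dur_symptomatic + (1.0 - phi) * dur_asymptomatic


def incidence_from_prevalence(prevalence, duration):
    """Steady-state incidence rate implied by a prevalence and a mean duration."""
    return prevalence / duration


def prevalence_from_incidence(incidence, duration):
    """Steady-state prevalence implied by an incidence rate and a mean duration."""
    return incidence * duration


phi = 26 / 115            # symptomatic proportion, from counts quoted in the text
dur_s = 6 * 7 / 365.25    # ASSUMPTION: midpoint of the 4-8 week range, in years
dur_a = 1.36              # years, estimate quoted in the text
duration = mean_duration(phi, dur_s, dur_a)

prevalence = 0.03         # HYPOTHETICAL prevalence for another setting
incidence = incidence_from_prevalence(prevalence, duration)
print(f"mean duration {duration:.2f} years, implied incidence {incidence:.4f} per year")
```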

The study raises the question: what is the best way to 
obtain accurate population-based estimates of CT inci- 
dence? Further direct study of infection and re-infection 
rates in opportunistically recruited women appears 
to be worthwhile. However, as well as taking account 
of clearance and re-infection during follow-up, it 
will probably always be necessary to 'recalibrate' 
setting-specific estimates to the general population. 



Studies of either prevalence or incidence based on struc- 
tured general population surveys are, therefore, 
essential. 

Our analysis of incidence, prevalence and duration 
has relied on an essentially static epidemiological 
model. The alternative would be to assess the consist- 
ency of a somewhat wider evidence base within a 
dynamic modelling context. For example, a dynamic 
model could be estimated from the same sources of 
data (incidence, prevalence, duration of symptomatic 
and asymptomatic infection, proportion sympto- 
matic), but also incorporating information on contact 
rates and transmission rates per contact. Dynamic 
modelling is not normally conceived as a synthesis 
exercise: more often, incidence is seen as an 'output' 
of a dynamic model. However, the feasibility of an 
evidence synthesis and consistency checking approach 
to dynamic models has already been established [27]. 
This kind of approach could be extended further
to incorporate information on the incidence,
prevalence and duration of CT infection in men.

APPENDIX 1. STATISTICAL METHODS 

The schematic influence diagram (Fig. 1) sets out all 
the relationships between model parameters and 
data in mathematical terms. Basic parameter nodes 
are shown as shaded ellipses and functional parameter 
nodes as clear ellipses. Two of the basic parameters
are given informative prior distributions, shown in
shaded rectangles with arrows pointing from the
rectangle to the parameter. The remaining 'basic'
parameters are given vague prior distributions, which
are not shown in the figure. The 'functional' parameters
are defined in terms of basic parameters, and
the definitions are shown as equations. Data that
are entered as a likelihood are shown in clear rectangles,
with arrows pointing from the parameter to the data.
A full list of basic and functional parameters, along
with brief descriptions, is provided in Table A1.

Most of the functional relationships have been
spelled out in the methods section as equations
(1)-(7), or in the DAG. Some expressions require
further explanation.

The expression:

$$\kappa(t) = \frac{\lambda}{\lambda + \lambda^{C}}\left[1 - e^{-(\lambda + \lambda^{C})t}\right] \qquad (8)$$

relates the proportion of infected individuals $\kappa(t)$, who
were initially uninfected, observed after time $t$, to an
incidence rate $\lambda$ and a clearance rate $\lambda^{C}$. This can
be derived from Kolmogorov's forward equations
[28, 29]. In Figure 1, the more complex relationship
represents a weighted average of two clearance 
rates: one being in symptomatic and the other in 
asymptomatic women. The proportion of infections 
that develop symptoms is the weight, and the clear- 
ance rates in each group are the reciprocals of the 
mean duration of symptomatic and asymptomatic 
infections. 
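
As a sketch of how these quantities fit together, the code below evaluates the two-state (uninfected/infected) solution of Kolmogorov's forward equations with constant rates, $\kappa(t) = \frac{\lambda}{\lambda + \lambda^{C}}[1 - e^{-(\lambda + \lambda^{C})t}]$, together with the weighted clearance rate described above. We assume this standard form is the one intended; the function names and numerical inputs are illustrative, and the 6-week symptomatic duration is an assumed midpoint of the 4-8 week range.

```python
import math


def clearance_rate(phi, dur_symptomatic, dur_asymptomatic):
    """Weighted average of the two clearance rates (per year).

    Each clearance rate is the reciprocal of the corresponding mean duration;
    the weight is the proportion of infections that develop symptoms.
    """
    return phi / dur_symptomatic + (1.0 - phi) / dur_asymptomatic


def prop_infected(incidence_rate, clearance, t):
    """Proportion infected at time t among those uninfected at time 0."""
    total = incidence_rate + clearance
    return incidence_rate / total * (1.0 - math.exp(-total * t))


lam = 0.05                                             # illustrative rate per year
lam_c = clearance_rate(26 / 115, 6 * 7 / 365.25, 1.36)  # 6 weeks is an ASSUMPTION
t = 0.5                                                # 6-month follow-up

with_clearance = prop_infected(lam, lam_c, t)
without_clearance = 1.0 - math.exp(-lam * t)           # ignores clearance
print(f"with clearance: {with_clearance:.4f}; ignoring clearance: {without_clearance:.4f}")
```

Because some incident infections clear before the follow-up test, the proportion observed is lower than it would be without clearance, which is why allowing for clearance raises the estimated incidence rate.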

Strictly speaking, Figure 1 sets out the relationships 
as they would be if the incidence and prevalence data 
were available on the exact same age groups. As the 
age groupings in the studies were slightly different, 
we used census information on the English female 
population sizes from 2002 for each year of age from
16 to 44 years to reweight the parameters. Readers can
see what was done from the WinBUGS code provided 
as Supplementary material, which is annotated to 
make all these adjustments clear. 
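
The reweighting step can be pictured as a population-weighted average of age-specific parameters. The sketch below is not the WinBUGS code; the function name, the age bands and all numbers are hypothetical and serve only to show the calculation.

```python
def pooled_rate(rates_by_group, population_by_group):
    """Population-weighted average of group-specific rates."""
    if set(rates_by_group) != set(population_by_group):
        raise ValueError("rates and populations must cover the same groups")
    total = sum(population_by_group.values())
    return sum(
        rates_by_group[g] * population_by_group[g] for g in rates_by_group
    ) / total


# HYPOTHETICAL inputs, for illustration only (not census or model values).
rates = {"16-24": 0.050, "25-34": 0.015, "35-44": 0.005}
population = {"16-24": 2_700_000, "25-34": 3_400_000, "35-44": 3_900_000}

print(f"pooled rate, ages 16-44: {pooled_rate(rates, population):.4f}")
```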

Prior distributions 

Vague normal priors were placed on the log incidence
rate in the LaMontagne study in age group 1 and GP
setting: $\ln(\lambda_{1,\mathrm{GP},1}) \sim N(0, 100^2)$, and also on the rate
ratios $\beta_s$ for setting $s$ relative to the GP setting, and
$\gamma_a$ for age group $a$ relative to the 16-17 years age
group, and for the ratios $\eta_s$ of re-infection rate to
infection rate in setting $s$: $\beta_s, \gamma_a, \eta_s \sim N(0, 100^2)$.

Priors for the duration of infection and proportion
symptomatic were as follows: proportion
symptomatic $\phi \sim \mathrm{Beta}(1, 1)$; $\Delta_A \sim N(0, 100^2)$;
$\Delta_S \sim \mathrm{Uniform}(0.0767, 0.1533)$, i.e. uniform between 4 and 8 weeks.
Information on the proportion of patients at recruitment
in the GP setting, $p_{a,\mathrm{GP}}$, in the LaMontagne
study who were in the re-infection group, reweighted
to account for disproportionate inclusion into the
study of initially CT-positive women, was introduced
via informative beta priors, derived from table 1 in
LaMontagne et al. [5]. For example, in women
aged 16-19 years who were tested at GP clinics, 
663 + 137 = 800 were CT negative, and 45 + 48 = 93 were
CT positive. So the correct weights for the infection
and re-infection groups are 800/893 and 93/893,
respectively. We repeated the same calculation for
women aged 20-24 years, and we assume the weights 
are constant within these two age groups. Although 
testing and treatment every 6 months interferes with 
the natural history, CT-positive women are sub- 
sequently placed in the re-infection group so this 
does not bias the results. 
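
The weights in the worked example, and the conversion of the 4-8 week range to years, can be checked directly. In the sketch below the counts come from the text; the function names are ours, the two rates passed to the mixture are hypothetical, and the use of 365.25 days per year is an assumption that happens to reproduce the quoted bounds.

```python
# Counts quoted in the text for women aged 16-19 years tested at GP clinics.
n_negative = 663 + 137   # CT negative at recruitment
n_positive = 45 + 48     # CT positive at recruitment
n_total = n_negative + n_positive

w_infection = n_negative / n_total     # weight for the infection group
w_reinfection = n_positive / n_total   # weight for the re-infection group


def force_of_infection(p_reinfection, rate_infection, rate_reinfection):
    """Mixture of infection and re-infection rates, weighted by group size."""
    return (1.0 - p_reinfection) * rate_infection + p_reinfection * rate_reinfection


def weeks_to_years(weeks, days_per_year=365.25):
    """Convert weeks to years (ASSUMPTION: 365.25 days per year)."""
    return weeks * 7 / days_per_year


# HYPOTHETICAL rates per year, only to show how the weights are applied.
foi = force_of_infection(w_reinfection, rate_infection=0.05, rate_reinfection=0.20)
print(f"weights {w_infection:.4f} and {w_reinfection:.4f}; mixed rate {foi:.4f}")
print(f"4 weeks = {weeks_to_years(4):.4f} years; 8 weeks = {weeks_to_years(8):.4f} years")
```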



Information on two parameters, $\Delta_S$ and the proportion
of women subject to the re-infection rate, $p_{a,\mathrm{GP}}$,
was introduced via informative priors rather than
through the data likelihood. This prevents these 'data'
from contributing directly to the global goodness- 
of-fit assessment. The decision to treat these inputs 
differently was because the source of evidence on the 
first was expert clinical knowledge quite unrelated to 
the other sources of data in the synthesis, while the 
second was local to the LaMontagne study. We were 
therefore interested less in the 'goodness of fit' of this 
information, and more in the goodness of fit of the 
other data, conditional on the priors we assigned to 
these parameters. In addition, we applied the 'cut func- 
tion' to both these parameters, a facility within the 
WinBUGS programming language that prevents infor- 
mation from the rest of the evidence network from 
'updating' priors [10] so in these cases the posterior 
for the parameter is the same as the informative prior. 

Data likelihoods 

The age-specific prevalence data $D_{a,\mathrm{pop}}$ in Table 3
were given a normal likelihood on the logit scale:
$\mathrm{logit}(D_{a,\mathrm{pop}}) \sim N(\mathrm{logit}(\pi_{a,\mathrm{pop}}), V_{a,\mathrm{pop}})$, with the variance
calculated from the 95% CIs. The setting-specific odds
ratios ($\mathrm{OR}_s$) in Table 4 were handled in the same
way on the log scale: $\log(\mathrm{OR}_s) \sim N(\beta_s, V_s)$. The data on duration of
asymptomatic infection (Table 1) were entered as a
normal likelihood: $\mathrm{Dur}_A \sim N(\Delta_A, V_A)$.
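
The text states that the variances were calculated from the 95% CIs but does not give the formula. One common approach, assumed here, treats the interval as symmetric on the logit scale and divides its width by 2 × 1.96 to obtain a standard error. The function names and the prevalence figures below are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def logit_mean_and_variance(estimate, lower, upper, z=1.96):
    """Mean and variance on the logit scale from an estimate and its 95% CI.

    ASSUMPTION: the interval is symmetric on the logit scale, so its width
    equals 2 * z standard errors.
    """
    se = (logit(upper) - logit(lower)) / (2.0 * z)
    return logit(estimate), se ** 2


# HYPOTHETICAL prevalence of 3.0% with 95% CI 2.0% to 4.5%.
mean, variance = logit_mean_and_variance(0.030, 0.020, 0.045)
print(f"logit-scale mean {mean:.3f}, variance {variance:.4f}")
```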

The numbers infected in Table 3 ($r$) are considered
as having a binomial likelihood, with parameters $\kappa_{a,s,i}$
and denominators also shown in the table, so that
$r_{a,s,i} \sim \mathrm{Binomial}(\kappa_{a,s,i}(0.5), n_{a,s,i})$. The number of symptomatic
infections ($r = 26$) reported by Geisler et al. [19] is
binomially distributed with parameter $\phi$ and
denominator 115.
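
Taken in isolation, a Beta(1, 1) prior combined with a binomial likelihood gives a closed-form Beta posterior. The sketch below applies this standard conjugate update to the counts quoted above; the function names are ours. In the full synthesis the proportion symptomatic is embedded in a larger model, so this is an isolated illustration rather than the model's posterior.

```python
import math


def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta(a, b) parameters after observing binomial data."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Beta(1, 1) prior; 26 symptomatic infections out of 115, as quoted in the text.
a_post, b_post = beta_binomial_update(1, 1, successes=26, trials=115)
print(f"posterior Beta({a_post}, {b_post}): "
      f"mean {beta_mean(a_post, b_post):.3f}, sd {beta_sd(a_post, b_post):.3f}")
```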

The WinBUGS code, available as Supplementary 
material, consists of the priors and likelihoods as 
described above, and the functional relationships 
described exactly as in Figure 1 and in the text. 

APPENDIX 2: ASSESSMENT OF 
STATISTICAL MODELLING 
ASSUMPTIONS 

Regression analysis of the LaMontagne data 

We fit the following nine regression models to the 18 
data points from LaMontagne shown in Table 2: 

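Model 1: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\gamma\beta)_{as} + (\gamma\eta)_{ai} + (\beta\eta)_{si} + (\gamma\beta\eta)_{asi}$,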



Table A1. Master list of parameters

| Basic parameters | Priors | Interpretation |
|---|---|---|
| Incidence | | |
| $\lambda_{1,1,1}$ | Log-normal | CT infection rate in women aged 16-17 years in the GP setting |
| $\gamma_a$ | Log-normal | Hazard ratio across age groups |
| $\beta_s$ | Log-normal | Hazard ratio across settings |
| $\eta_s$ | Log-normal | Setting-specific re-infection to infection hazard ratios |
| $p_{a,s}$ | Beta | Baseline prevalence of LaMontagne sampling frame |
| Duration | | |
| $\Delta_S$ | Uniform | Mean duration of symptomatic CT infection |
| $\Delta_A$ | Normal | Mean duration of asymptomatic CT infection |
| $\phi$ | Binomial | Proportion of CT episodes in which symptoms will develop |
| Prevalence | | |
| $\pi_{a,\mathrm{pop}}$* | $\pi_{a,\mathrm{pop}} = \pi_{1,\mathrm{pop}}\,\gamma_a$ | Population prevalence of CT by age group |

| Functional parameters | Function | Interpretation |
|---|---|---|
| Incidence | | |
| $\lambda_{a,s,1}$ | $\gamma_a \beta_s \lambda_{1,1,1}$ | Infection rate for women in age group $a$ and setting $s$ |
| $\lambda_{a,s,2}$ | $\eta_s \lambda_{a,s,1}$ | Re-infection rate for women in age group $a$ and setting $s$ |
| $\lambda^{\mathrm{FOI}}_{a,\mathrm{GP}}$ | $(1 - p_{a,s})\lambda_{a,s,1} + p_{a,s}\lambda_{a,s,2}$ | Force of infection for women in age group $a$ in the GP setting |
| $\lambda^{\mathrm{FOI}}_{a,\mathrm{pop}}$ | $\beta_{\mathrm{pop}}\,\lambda^{\mathrm{FOI}}_{a,\mathrm{GP}}$ | Force of infection for women in age group $a$ in the general population |
| $\lambda^{\mathrm{INC1}}_{a,\mathrm{pop}}$ | $\lambda^{\mathrm{FOI}}_{a,\mathrm{pop}} / (1 + \Delta\,\lambda^{\mathrm{FOI}}_{a,\mathrm{pop}})$ | Incidence rate of CT for women in age group $a$ in the general population estimated using method A |
| $\lambda^{\mathrm{INC2}}_{a,\mathrm{pop}}$ | $\pi_{a,\mathrm{pop}} / \Delta$ | Incidence rate of CT for women in age group $a$ in the general population estimated using method B |
| $\kappa(t)_{a,s,i}$ | See Figure 1 | Proportion of CT-negative women in age group $a$, setting $s$, and re-infection status $i$, expected to be CT positive after 6 months |
| Duration | | |
| $\Delta$ | $\Delta_S\,\phi + \Delta_A(1 - \phi)$ | Mean duration of CT infection |
| | $1/\Delta_A$ | Mean clearance rate of asymptomatic CT infections |
| | $1/\Delta_S$ | Mean clearance rate of symptomatic CT infections |
| Prevalence | | |
| $\pi_{a,\mathrm{pop}}$* | $\lambda_{a,\mathrm{pop}}\,\Delta$ | Population prevalence of CT by age group |

CT, Chlamydia trachomatis.

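* Prevalence is a basic parameter in method B but a functional parameter in the full synthesis model.

Model 2: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\gamma\beta)_{as} + (\gamma\eta)_{ai} + (\beta\eta)_{si}$,

Model 3: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\gamma\eta)_{ai} + (\beta\eta)_{si}$,

Model 4: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\gamma\beta)_{as} + (\beta\eta)_{si}$,

Model 5: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\gamma\beta)_{as} + (\gamma\eta)_{ai}$,

Model 6: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\gamma\beta)_{as}$,

Model 7: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\gamma\eta)_{ai}$,

Model 8: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i + (\beta\eta)_{si}$,

Model 9: $\log(\lambda_{a,s,i}) = \alpha + \gamma_a + \beta_s + \eta_i$,

where: $\gamma_1, \beta_1, \eta_1, (\gamma\beta)_{a1}, (\gamma\beta)_{1s}, (\gamma\eta)_{a1}, (\gamma\eta)_{1i}, (\beta\eta)_{s1}, (\beta\eta)_{1i}, (\gamma\beta\eta)_{as1}, (\gamma\beta\eta)_{a1i}, (\gamma\beta\eta)_{1si} = 0$;
$(\gamma\beta)_{as}$ represents an interaction between
age and setting, $(\gamma\eta)_{ai}$ between age and re-infection, $(\beta\eta)_{si}$
between setting and re-infection, and $(\gamma\beta\eta)_{asi}$ is a three-way
interaction between age, setting, and re-infection.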
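
The nominal numbers of parameters for the nine regression models can be checked by counting the free terms under the corner constraints. The sketch below assumes three age groups, three settings and two infection statuses, which is our inference from the 18 data points (3 × 3 × 2) rather than something stated in this section; the names are illustrative.

```python
from math import prod

# ASSUMPTION: factor levels inferred from 18 data points = 3 x 3 x 2.
LEVELS = {"age": 3, "setting": 3, "status": 2}


def n_parameters(terms):
    """Intercept plus free parameters per term under corner constraints.

    A term involving several factors contributes the product of
    (levels - 1) over those factors.
    """
    return 1 + sum(prod(LEVELS[f] - 1 for f in term) for term in terms)


MAIN = [("age",), ("setting",), ("status",)]
AS = ("age", "setting")
AI = ("age", "status")
SI = ("setting", "status")
ASI = ("age", "setting", "status")

MODELS = {
    1: MAIN + [AS, AI, SI, ASI],
    2: MAIN + [AS, AI, SI],
    3: MAIN + [AI, SI],
    4: MAIN + [AS, SI],
    5: MAIN + [AS, AI],
    6: MAIN + [AS],
    7: MAIN + [AI],
    8: MAIN + [SI],
    9: MAIN,
}

counts = {model: n_parameters(terms) for model, terms in MODELS.items()}
print(counts)
```

Under these assumptions the counts match the nominal numbers of parameters listed in Table A2.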

The estimates of $\lambda_{a,s,i}$, along with estimates of
duration, feed into equation (8) to estimate $\kappa(t)_{a,s,i}$,
the parameter in the likelihood function for the
LaMontagne data. Model fit statistics together with 
the nominal numbers of parameters for each model 
are shown in Table A2. Results are based on two 
chains run for 40000 samples after a 10000 burn-in. 
The results show that model 8, which includes only 
the main effects and an interaction between setting 
and infection/re-infection status has the lowest 
Deviance Information Criterion (DIC). The DIC is
a commonly used statistical measure of model fit
that penalizes more complex models [22]. A plot
of the deviance residuals for model 8 showed no problems
(not shown). Model 8 is identical to the one
described in the main text, although it has been
re-parameterized slightly to simplify the notation.
It is only marginally better than a model that also
includes an interaction between age and setting
(model 4), or a model that assumes no interactions
(model 9).

Table A2. Model fit statistics for each of the regression
models fitted to the LaMontagne data from Table 2

| Model | Residual deviance | Nominal number of parameters | DIC | $p_D$ |
|---|---|---|---|---|
| 1 | 18.6 | 18 | 103.2 | 17.9 |
| 2 | 15.8 | 14 | 96.2 | 13.2 |
| 3 | 19.0 | 10 | 95.8 | 10.1 |
| 4 | 14.5 | 12 | 93.1 | 11.9 |
| 5 | 22.6 | 12 | 101.1 | 11.9 |
| 6 | 21.1 | 10 | 97.7 | 10.0 |
| 7 | 24.1 | 8 | 98.9 | 8.0 |
| 8 | 17.3 | 8 | 91.9 | 8.0 |
| 9 | 22.3 | 6 | 95.0 | 6.0 |

DIC, Deviance Information Criterion; $p_D$, effective number
of parameters [22].
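
The model comparison can be retraced from the DIC column of Table A2. The sketch below picks the model with the lowest DIC and reports how far the others lie from it; the variable names are ours and the values are transcribed from the table.

```python
# DIC values transcribed from Table A2, keyed by model number.
DIC = {
    1: 103.2, 2: 96.2, 3: 95.8, 4: 93.1, 5: 101.1,
    6: 97.7, 7: 98.9, 8: 91.9, 9: 95.0,
}

best_model = min(DIC, key=DIC.get)

# Difference in DIC relative to the best model (smaller is closer).
dic_difference = {
    model: round(value - DIC[best_model], 1) for model, value in DIC.items()
}

for model in sorted(DIC, key=DIC.get):
    print(f"model {model}: DIC {DIC[model]:.1f} (+{dic_difference[model]:.1f})")
```

Models 4 and 9 lie 1.2 and 3.1 DIC units above model 8, respectively, which is the sense in which model 8 is only marginally better.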

Assumed relationship between setting-specific odds 
ratios from Adams and hazard ratios in the model 
for the LaMontagne data 

We assume that the between-setting odds ratios are 
equivalent to between-setting relative risks due to 
the rare disease assumption and that these inform 
the between-setting hazard ratios in LaMontagne. 
This is not strictly correct as they should inform the 
between-setting incidence ratios. 
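
The rare disease assumption can be illustrated numerically: when the outcome is uncommon, the odds ratio is close to the risk ratio, and the two diverge as the outcome becomes common. The prevalences below are hypothetical and the function names are ours.

```python
def odds(p):
    """Odds corresponding to a proportion p in (0, 1)."""
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_reference):
    """Ratio of the odds in two groups."""
    return odds(p_exposed) / odds(p_reference)


def risk_ratio(p_exposed, p_reference):
    """Ratio of the proportions in two groups."""
    return p_exposed / p_reference


# HYPOTHETICAL prevalences: a rare outcome and a common one, same risk ratio.
for label, p_ref, p_exp in [("rare", 0.03, 0.06), ("common", 0.30, 0.60)]:
    print(f"{label}: risk ratio {risk_ratio(p_exp, p_ref):.2f}, "
          f"odds ratio {odds_ratio(p_exp, p_ref):.2f}")
```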

We assess the sensitivity of the results to this 
assumption. Table A3 shows the between-setting 
infection (INF) ratios (column 1), FOI ratios 
(column 2), and incidence (INC) ratios (column 3), 
estimated from the LaMontagne data alone. The 
corresponding results from Adams (introduced in 
table 4) are repeated in column 4. The INF ratios 
from LaMontagne are almost identical to the odds 
ratios from Adams. However, this is not a reason to 
conclude that our model is better than the 'correct' 
model where they inform INC ratios. There is some 
discrepancy between the INF ratios compared to the 
INC or FOI ratios. The FOI and INC ratios are 
almost identical. 

Because of the lack of data for the general
population in LaMontagne, it is very difficult
to parameterize the model correctly so that the
ORs inform the INC ratios. It is, however, possible,
although considerably more mathematically complicated
than the model described in this paper, to apply
the odds ratios to FOI ratios. We performed this
analysis for the full synthesis model and found that
incidence changed by less than 5% in all age groups
(mean <3%). From this, and
the fact that the INC and FOI ratios agree so closely,
we conclude that there is only negligible bias
from not parameterizing the model so that odds ratios
inform INC ratios.

Table A3. Between-setting ratios of infection (INF) rates, force of infection (FOI) rates, and incidence (INC) rates
estimated for the LaMontagne [5] data alone, along with odds ratios reported in Adams [6]

| Ratio | LaMontagne INF ratios | LaMontagne FOI ratios | LaMontagne INC ratios | Adams odds ratios (repeated from table 4) |
|---|---|---|---|---|
| FP to GP | 1.27 (0.59-2.48) | 1.10 (0.65-1.74) | 1.11 (0.61-1.88) | 1.27 (1.00-1.62) |
| STD to GP | 2.31 (1.11-4.45) | 1.60 (0.95-2.53) | 1.74 (0.94-2.97) | 2.39 (0.72-3.33) |

FP, Family planning; GP, general practitioner; STD, sexually transmitted disease clinic.

SUPPLEMENTARY MATERIAL 

For supplementary material accompanying this paper 
visit http://dx.doi.org/10.1017/S0950268813001027.